Enhancement Of A Chinese Discourse Marker Tagger With C4.5
Abstract
Discourse markers are complex discontinuous linguistic expressions which are used to explicitly signal the discourse structure of a text. This paper describes efforts to improve an automatic tagging system which identifies and classifies discourse markers in Chinese texts by applying machine learning (ML) to the disambiguation of discourse markers, as an integral part of automatic text summarization via rhetorical structure. Encouraging results are reported.
Similar resources
Fine-Grained Chinese Discourse Relation Labelling
This paper explores several aspects together for a fine-grained Chinese discourse analysis. We deal with the issues of ambiguous discourse markers, ambiguous marker linkings, and more than one discourse marker. A universal feature representation is proposed. The pair-once postulation, cross-discourse-unit-first rule and word-pair-marker-first rule select a set of discourse markers from ambiguou...
Topic Identification In Chinese Based On Centering Model
In this paper we are concerned with identifying the topics of sentences in Chinese texts. The key elements of the centering model of local discourse coherence are employed to identify the topic which is the most salient element in a Chinese sentence. Due to the phenomenon of zero anaphora occurring in Chinese texts frequently, in addition to the centering model, we further employ the constraint...
Acquisition of the perfective aspect marker Le of Mandarin Chinese in discourse by American college learners
Sentence Classification Experiments for Legal Text Summarisation
We describe experiments in building a classifier which determines the rhetorical status of sentences. The research is part of a text summarisation project for the legal domain and we use a newly compiled and annotated corpus of judgments of the UK House of Lords. Rhetorical role classification is an initial step which provides input to the sentence selection component of the system. We report r...
Disambiguating potential connectives
Many discourse connectives also have non-discourse, or sentential, readings. Therefore, for automatic discourse structure analysis, a disambiguation problem arises even before the question of the signalled discourse relation becomes relevant. We focus here on a set of nine German connectives and characterize the task of determining their discourse/sentential reading. Starting from an analysis of...